Instructors
Meetings
Websites
Quiz dates (mark on your calendars now!)
Prerequisites
Please read the syllabus carefully. You are responsible for adhering to course policies at all times, especially the academic code of conduct, plagiarism, AI, and collaboration policies.
The syllabus can be downloaded from a link at the top of the Piazza course information page. These lecture notes contain an abbreviated version of the syllabus, and should not be considered as a substitute for the syllabus itself.
CDS DS 453 investigates techniques for performing trustworthy data analyses without a trusted party, and for conducting data science without data sharing.
The first half of the course investigates cryptocurrencies, the blockchain technology underpinning them, and the incentives for each participant. Students will learn how to create transactions, develop smart contracts, and participate in decentralized exchanges. Then, we take a deeper dive into consensus mechanisms, historical and modern, that maintain stability if a certain fraction of the participants or computing power behaves honestly.
The second half of the course focuses on privacy and anonymity using advanced tools from cryptography. We study zero knowledge proofs and their role in preventing re-identification attacks and increasing scalability of blockchains. We also study secure multiparty computation and its role in designing private contracts and atomic swaps. The course concludes with a broader exploration into the power of conducting data science without being able to see the underlying data.
Within the undergraduate Data Sciences major, this course satisfies the DS methodology elective in the “scalable & trustworthy DS” category.
In spring 2024, this course does not satisfy any Hub units.
There are five parts to this course, each with their own learning objectives.
The full list of 14 learning objectives is available in the syllabus; roughly, they correspond to the weeks of the course.
The full course schedule is available on the Piazza Course Schedule page. It contains the required textbook reading, lecture notes/video, and assignments due each week. The schedule might change slightly over time, and I welcome suggestions from you about topics that you want to see covered in this course.
No prior background in cryptography is necessary.
This course requires mathematical maturity with linear algebra and probability concepts at the level of DS 122 or equivalent.
It also requires familiarity with algorithms and big-O notation at the level of DS 320, CS 330, or equivalent.
If you do not meet the prerequisites of this course, please talk with me after today's lecture.
There are 2 websites for this course, Piazza and Gradescope.
The course schedule and homework assignments will be posted on Piazza at https://piazza.com/bu/spring2024/ds453653. You are responsible for any announcements on Piazza, so please register using an email address that you check regularly. (Also, only email the instructors for a personal or administrative manner; all scientific questions must be sent via Piazza.)
Completed assignments must be submitted on Gradescope; this is the only way to receive credit. Please register at https://www.gradescope.com/courses/710247 using entry code B2Y7G5.
A typical week of this course contains:
There are four types of graded assignments. The grading system is likely different than what you're used to in most courses. The main rules-of-thumb to remember for everything in this course is:
You must read a portion of a textbook every week. The required reading is listed on the Piazza Course Schedule page.
To assess your understanding of the material, we will post 5 multiple-choice or short-answer questions on Gradescope each week.
Programming assignments are due in most weeks of the course, but there are intentional gaps around the times of quizzes and project due dates to allow you time to work on those. Each assignment contains questions collectively worth 5 points.
There are 4 quizzes in this course, one for each of Parts 1 through 4 of the course. (There is no quiz for Part 5; it is evaluated instead in the assessments.)
Quizzes occur either during the discussion or lecture sections; the exact quiz dates are stated at the top of the syllabus and posted on the Piazza Course Schedule page.
Each quiz contains 5 questions. Your responses are graded on correctness and clarity. You will receive credit for a response that meets the stated expectations of the question, sufficiently explains and demonstrates understanding of the concepts, and does not contain any significant gaps, errors, or omissions.
There are two open-ended projects in this course.
DS 653 students must complete both projects. DS 453 students only need to complete one project of their choice.
We will provide more details about the projects later in the semester. You will have 3-4 weeks to work on each project. As with everything else in this course:
Your base grade in this course is based on passing a sufficient number of reading assessments, programming assignments, quizzes, and projects.
For DS 453 students: to receive a course grade of A, B, C, or D, you must complete all stated requirements below. In other words, the sentences below have four clauses, and you need to satisfy the conjunction of their requirements.
For DS 653 students: to obtain a base grade of A or B, you must pass 2 projects.
This base grade can be increased as follows:
At any point in the course: you should know where you stand, and what additional work you need to perform in order to reach each grade level. The syllabus contains a sheet that you can use to track your progress in the course.
You are responsible for adhering to course policies at all times! If you have questions about the policies, feel free to ask me at any time -- for instance, before/after lecture, at office hours, or on Piazza.
You must read and adhere to BU’s Academic Code of Conduct, which is available here: https://www.bu.edu/academics/policies/academic-conduct-code/. All work in this course is subject to BU’s Academic Code, so please familiarize yourself with this code, its definitions of misconduct and plagiarism, and its sanctions.
All written work in this course must be original to you. If you consult outside texts, or other forms of assistance, cite these sources in the proper format—at a minimum, include the author, title, and website link for all external sources (books, journals, lectures, web sites, AI). We are required to report all suspected cases of plagiarism to the Academic Dean for review.
Academic integrity in computing coursework has some special aspects. Please review the examples of plagiarism as provided by the BU Computer Science department: https://www.bu.edu/cs/undergraduate/undergraduate-life/academic-integrity/.
All submitted work in this course must conform to the CDS Generative AI Assistance Policy, which you can read at https://www.bu.edu/cds-faculty/culture-community/gaia-policy/. Also, keep in mind that AI tools are often wrong!
The goal of homework assignments and projects is to learn. Hence, I encourage you to use any and all resources that can help you to learn the material: computers/calculators, Piazza, lecture notes, textbooks, other websites, and your fellow classmates. That said,
By contrast, the goal of the quizzes is for you to show me what you have learned. So, any form of collaboration is strictly prohibited. Also, written notes and electronic aids (e.g., computers or calculators) are all forbidden from use during quizzes. (That said, I encourage you to collaborate with classmates when studying lecture materials and preparing for the quizzes.)
Violations of the conduct code and collaboration policy will result in receiving a score of N (not passed) on the assignment or quiz without the possibility to improve the grade, and may be grounds for referral to BU’s Academic Conduct Committee.
If you have any questions or concerns about the conduct code and collaboration policy, then I recommend that you ask me (in person or via private Piazza note) before taking an action that might be a violation. Additionally, at the bottom of each assignment and project, you must list:
This is your opportunity to document and explain any collaborators or sources used and why you believe it adheres to the code of conduct and collaboration policy.
If I discover a violation of the policy that you have documented, then I will not refer the case to BU’s Academic Conduct Committee; the worst possible recourse is that I will ask you to redo the assignment (or create a new, similar assignment for you to do in its place). If I discover an undocumented violation after the fact, then this will be considered academic misconduct.
BU strives to be accessible, inclusive, and diverse in our facilities, programming, and academic offerings. Your experience in this course is important to the teaching staff.
If you have a disability or believe that you require a reasonable accommodation, please meet with BU Disability and Access Services as soon as possible at the beginning of the semester. Their office is at 25 Buick Street, Suite 300, and they can be contacted at 617-353-3658. Requests for accommodations are sent by that office to the Academic Dean who approves and returns them. Disability and Access Services then forwards them to the instructor.
If you have an accommodation letter, please send it (and only it) to me via email early in the semester, before tasks become due.
This course seeks to be inclusive of people of all genders, races, cultures, abilities, and sexual orientations. Please respect your fellow classmates and contribute toward a positive learning environment for everyone.
While I actively encourage discussion and debate on ideas, I won’t tolerate criticism of other people. Also, while you can use a computer for note-taking, you may not use a laptop or cellular phone in class for web surfing, sending messages, or anything else that can cause a distraction to your fellow classmates. Anyone who causes a disruption to the learning environment will be asked to leave.
This course follows BU’s policy on absences for religious observance. Otherwise, students should attend the lectures and discussion labs, either in person or virtually via Zoom.
Due to ongoing public health concerns: if you feel sick, please err on the side of caution and attend via Zoom or review the lecture notes and video afterward on Piazza.
You are responsible for submitting reading assessments, programming assignments, and projects electronically on Gradescope by the stated due date and time (typically, Thursday at 8pm).
Up to three times in the semester, you can submit a reading assessment, programming assignment, or project up to 3 days late (e.g., by Sunday at 8pm).
Any subsequent late assignments are not accepted (and any such auto-graded scores will be manually discarded after the fact), except in cases of long-term emergencies (e.g., family, medical); if this applies to you, inform me as soon as you are able.
If you have a valid conflict with a quiz, you must send me a written note as soon as you are aware, and with a minimum of 2 weeks notice (barring extenuating circumstances, e.g., illness).
The final exam period (i.e., the time of the final chance to take each quiz) can only be rescheduled in accordance with the university policy: https://www.bu.edu/reg/calendars/final-exams/policy/.
You have the right to request a regrade of any project or quiz question. All regrade requests must be submitted via Gradescope, and must describe a factual error in our assessment. If you request a regrade for one question on a quiz or one part of a project, then we have the right to review the entire quiz or project. Beware that this may potentially result in a lower grade.
The reading assessments and programming assignments are auto-graded, so in general we do not anticipate regrades. That said, if you discover an error in our auto-grading system, please send us a Piazza note explaining the issue and we will look into it.
“Cryptography is how people get things done when they need one another, don’t fully trust one another, and have adversaries actively trying to screw things up.”
– Ben Adida
This is a course about crypto... which is admittedly a term with many different meanings to different people. It's also an abbreviation with (at least) three different endings.
We will get to the topic of cryptocurrencies in a few minutes. But first, here is a question to start us off in this course: what is cryptology?
Cryptology comes from two Greek word parts:
So the goal of cryptology is to study the art of keeping secrets.
In its modern form, crypto uses hard math problems as a way to encode data in order to restrict who can use it, how they can use it, and sometimes even when and where they can access it.
Cryptology can be divided into two sub-fields
Both cryptography and cryptanalysis are challenging, multi-disciplinary tasks. They also reinforce each other, due to Schneier's law.
"Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break."
– Bruce Schneier
In this course we will almost exclusively focus on the defensive, cryptographic side. Even though I won't say much about cryptanalysis in this course, it is important to know that all of the building blocks I will show you in today's lecture have been vetted through decades of cryptanalysts, collectively spanning many centuries or millenia of people-years. No single one of us can compete with that.
"Don't roll your own crypto."
– (Every cryptographer)
Crypto is a scientific field at the intersection of many disciplines. To accomplish their goals, cryptographers use:
This class will primarily make use of the first two skills: algorithms and software engineering.
While we won't delve into the specific type of math used in cryptanalysis, nevertheless we will often use concepts from algebra and probability.
This is also a course about how crypto can be used within data science.
It's likely that you have already used crypto in all of your data science projects, perhaps without explicitly thinking about it.
In this image, the lock icon in the web browser means that my computer has a protected connection to the kaggle.com servers.
At a high level, this means that nobody else on the internet can
(Note though that the kaggle.com servers can see what you are doing on their website. We will see later in the semester how even this can potentially be avoided.)
Using crypto is an essential part of providing security over the Internet, because it has always been designed as an open network. There is no wire that directly connects your computer with kaggle.com. Instead, the data flows through several other computers along the way.
Here is a picture of all of the computer servers connected to the Internet in 1968.
Source: pwnallthethings on Twitter
But the design of the Internet has fundamentally remained the same.
“The Internet is just the world passing notes in a classroom.”
– Jon Stewart
Getting the crypto right is essential here.
If the system is vulnerable to decoding or tampering by unauthorized parties, then it can lead to universal, covert breaches of security. Moreover, everyone will have a false sense of security.
If the system is too onerous or slow to use, then people will ignore it and move to other, less secure options.
We will see throughout this course that crypto can be used in many ways besides network transmissions on the Internet.
Still though, this example highlights a general theme of all of our applications:
crypto is a social science that masquerades as a mathematical science.
Our goal is to use the tools from math, computer science, and data science in order to design systems that help people.
At a high level, cryptographic systems attempt to provide (some or all of) the following three objectives.
These three security objectives are sometimes referred to as the "CIA triad."
Admittedly, these three objectives are both underspecified (lacking mathematical rigor) and overloaded (the terms have many different sub-meanings).
There are many different flavors of confidentiality, integrity, and availability that cryptography can provide... in fact, far more than we have time to cover in this course.
Source: Handbook of Applied Cryptography, page 4
(Note: BU courses CS 538 and CS 558 also cover different aspects of cryptography and its application in modern digital systems. This course has a partial overlap with them, but we will cover many topics that they don't and vice-versa. You are welcome to register for those courses as well, but they are not required knowledge.)
Because crypto is a scientific field that has social impacts, it is probably inevitable that crypto has clashed with law and policy over the past five decades.
"Cryptography rearranges power: it configures who can do what, from what. This makes cryptography an inherently political tool, and it confers on the field an intrinsically moral dimension."
– Phillip Rogaway
We will see aspects of the two-way connection between crypto and policy throughout the semester, and we will spend the last few weeks focusing exclusively on it.
[Note: This section of the lecture is based on Johns Hopkins CS 601.641 by Prof. Abishek Jain.]
Pausing the cryptography discussion for a moment, let's provide an overview of cryptocurrencies like Bitcoin and Ethereum. We will spend the next few weeks delving into their design.
From the start, it is worthwhile to state what this course is NOT about.
⚠ This is not a finance course on cryptocurrencies. You should not expect to be taught how to invest in cryptocurrencies or how to become a billionaire overnight.
In fact, the cryptocurrency market is still in its infancy and is incredibly volatile. Investing in cryptocurrencies is a good way to lose money. So this course doesn't offer any investment advice, except in this sentence: I advise you not to invest in cryptocurrencies.
Still though, cryptocurrencies are a worthy topic of study in order to understand their capabilities, potential, and limitations.
In the first unit of this course, we will explore the following topics about cryptocurrencies.
Let's start at the top of the list. What is a blockchain?
It is:
Blockchains are a data structure to keep track of transactions.
Here is the motivating scenario for this unit: there are two people, who (following the typical convention in crypto) we will call Alice and Bob.
Alice would like to transfer $1 to Bob.
Let's think about
| Alice | Bob | |
|---|---|---|
| $\longrightarrow$ |
Let's start by exploring how money transfers work in the world of physical banking. Suppose Alice writes a check to Bob. What happens?
If everyone is honest, the money is transferred because a bank performs the bookkeeping task of recording every account and its balance.
Source: Wikipedia
Now let's think about what happens if each party is malicious. For this single transaction, Alice doesn't really care whether Bob is malicious or not; she isn't expecting to receive anything of value.
In the other direction: even if Bob trusts the bank (for now), he may be suspicious that Alice is trying to deceive him. He needs to be able to verify the following three properties.
Alice properly signed the check: To accomplish this goal, Bob can verify the signature himself, say by comparing the check against Alice's identification card.
Alice possesses $1 in her bank account: Bob can ask the bank to verify that Alice's account balance is sufficient for the check to clear. Note that Bob doesn't even need to learn Alice's balance, just the result of the binary predicate.
Alice does not double spend the money by writing a check to someone else at the same time: This is the toughest property to achieve, because neither Bob nor the bank know yet whether Alice has overdrafted her account. They will only discover this later; if so, Alice and Bob can use the bank and the legal system to settle their dispute.
By contrast, the digital world is diverse and international. There is no single entity that everyone in the world trusts, nor is there a single legal system to identify and prosecute thieves and money launderers.
So, if we want a digital system of money, then we need a different approach altogether.
It will take us a few lectures to build up the tools for a digital system of transferring money. First, we need to understand two essential crypto tools: digital signatures and hash functions.
Today, we only focus on property #1: building a digital version of signing a check.
A digital signature allows Alice to append a signature $\sigma$ to a message $M$ that satisfies two properties.
Also, there are (at least) two properties that digital signatures do not achieve on their own.
Receiver authentication: Because a signature can be verified by anyone, on its own it will not tell Bob that he was the intended target of the message. If Alice wants to convey that Bob is the intended target, she should do so in the message contents $M$.
Thwarting replay attacks: If Bob has a digital signature, he can show it to others (e.g., a bank) as many times as he wants. To solve this issue, we will have to be clever in thinking about the format of the message $M$ -- for instance, perhaps it should contain a unique serial number, so the bank can detect if the same check is being cashed twice.